Sequence similarity searches on the World Wide Web.
Abstract
If you have just determined the sequence of an interesting bit of DNA, one of the first questions you are likely to ask yourself is “has anybody else seen anything like this before?” Fortunately, there has been a very successful international effort to collect all known sequences (both DNA and protein) into databases so that they can be searched. These databases (and searching tools) are freely available on the Web. The problem is that these databases are huge and, as a result, you must compare your sequence with a vast number of other sequences. Sequence comparison is the most powerful and most reliable method of answering biological questions about the evolutionary relationships between genes. Recent improvements in the statistical methods used by similarity-searching computer programs allow the biologist to draw conclusions with a high level of certainty. Now that the entire genomes of some organisms have been completely sequenced, it is possible to use similarity-search methods to provide definitive answers to questions such as “Is there a copy of this interesting gene in this organism?”
A database search is a similarity search, but it is frequently, although incorrectly, referred to as homology searching. The term “homology” implies a common evolutionary relationship between two traits. Just because two sequences share a stretch of nearly identical nucleotides (or amino acids [aa]) does not mean that they are directly descended from a common ancestor. Homologous proteins share a common 3-dimensional structure, while proteins with a chance similarity of aa sequence do not. Over evolutionary time, two homologous proteins may diverge to the point that sequence similarity cannot be detected, yet they may still retain their common structure. Homologous proteins generally have similar biological functions, but this is not a rigorous requirement; conversely, discovery of homology is not proof of function.
Of course, a high level of similarity is a strong indication of homology. As a rule of thumb, 25% identity over a stretch of 100 aa can be considered good evidence of common ancestry for two sequences. Homologous sequences are usually similar over an entire sequence, or sometimes just over one functional domain, but it is never correct to state that two sequences are “50% homologous.” Homology is an all-or-nothing decision. Be careful in asserting homology between short regions of two sequences; matches that are more than 50% identical in a 20–40 aa region can occur by chance.
At present, the two most popular programs for similarity searching are Basic Local Alignment Search Tool (BLAST) (1) and FASTA (3). They use similar approaches to reach similar answers, although there are some subtle differences. The two programs will generally give slightly different results in terms of which sequences are found to be similar, their relative rankings by similarity score, and the statistical significance assigned to those scores. It is generally thought that FASTA is more sensitive for DNA-DNA comparisons, but BLAST is usually faster. The choice of which program to use is often governed by convenience rather than mathematical rigor, but to be thorough, it is best to perform searches with both programs.
For most researchers, the Web has become the best way to run similarity searches. Web servers are available to everyone, they have the most current data, they are usually fast, and your computer is not tied up while your search is running. However, the addresses and the services offered by various Web servers (see BioBit) are subject to change without warning, so it is wise to have some alternatives lined up when your computing is dependent on the kindness of strangers. Protein similarity searches can find much more distant similarities than comparisons of DNA sequences (ca. 2.5 billion vs. 100 million years of evolutionary divergence).
This is true for several
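The rule of thumb quoted in the abstract (about 25% identity over a stretch of 100 aa) can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned by some other tool and are passed in as equal-length strings with `-` marking gaps. The function names `percent_identity` and `suggests_common_ancestry` are hypothetical, and counting only gap-free columns is one of several conventions for percent identity, not the one used by BLAST or FASTA. The check is a rough screen only; it does not replace the statistical significance those programs report.

```python
from typing import Tuple


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> Tuple[float, int]:
    """Return (percent identity, number of gap-free columns compared).

    Both inputs must be pre-aligned and of equal length. Columns where
    either sequence has a gap are skipped (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # ignore gapped columns
        compared += 1
        if x == y:
            identical += 1

    if compared == 0:
        return 0.0, 0
    return 100.0 * identical / compared, compared


def suggests_common_ancestry(aligned_a: str, aligned_b: str,
                             min_identity: float = 25.0,
                             min_length: int = 100) -> bool:
    """Apply the abstract's rule of thumb: >= 25% identity over >= 100 aa."""
    pid, n = percent_identity(aligned_a, aligned_b)
    return n >= min_length and pid >= min_identity
```

A short, highly identical match (for example, 20 aa at 100% identity) deliberately fails this check, which mirrors the abstract's warning that matches in 20–40 aa regions can occur by chance.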
Similar resources
RIDOM: Ribosomal Differentiation of Medical Micro-organisms Database
The ribosomal differentiation of medical micro-organisms (RIDOM) web server, first described by Harmsen et al. [Harmsen,D., Rothganger,J., Singer,C., Albert,J. and Frosch,M. (1999) Lancet, 353, 291], is an evolving electronic resource designed to provide micro-organism differentiation services for medical identification needs. The diagnostic procedure begins with a specimen partial small subun...
Syntactic Clustering of the Web
We have developed an efficient way to determine the syntactic similarity of files and have applied it to every document on the World Wide Web. Using this mechanism, we built a clustering of all the documents that are syntactically similar. Possible applications include a "Lost and Found" service, filtering the results of Web searches, updating widely distributed web-pages, and identifying viola...
BEAUTY: an enhanced BLAST-based search tool that integrates multiple biological information resources into sequence similarity search results.
BEAUTY (BLAST enhanced alignment utility) is an enhanced version of the NCBI's BLAST data base search tool that facilitates identification of the functions of matched sequences. We have created new data bases of conserved regions and functional domains for protein sequences in NCBI's Entrez data base, and BEAUTY allows this information to be incorporated directly into BLAST search results. A Co...
Advanced Similarity Searches on the Web: Gapped BLAST, PSI-BLAST, FASTA 3.0 and INCA
In the ever changing world of bioinformatics, the two most popular programs for sequence similarity searching, Basic Local Alignment Search Tool (BLAST) and FASTA, have both recently been improved. BLAST Version 2.0 is now available at the National Center for Biotechnology Information (NCBI) Web site, and FASTA 3.0 is available both as free software for most computer systems and on several Web ...
WebPACADE: A System for the Analysis of Structural Similarity of Protein via WWW
This paper describes WebPACADE, a system under development that extends PACADE, a deductive database system for the analysis of protein 3D structure. It enables structural similarity searches on various proteins at the level of secondary structure. Results of a similarity search can be displayed graphically in a WWW browser.
ExPASy: the proteomics server for in-depth protein knowledge and analysis
The ExPASy (Expert Protein Analysis System) World Wide Web server (http://www.expasy.org) is provided as a service to the life science community by a multidisciplinary team at the Swiss Institute of Bioinformatics (SIB). It provides access to a variety of databases and analytical tools dedicated to proteins and proteomics. ExPASy databases include SWISS-PROT and TrEMBL, SWISS-2DPAGE, PROSI...
Journal: BioTechniques
Volume 24, Issue 2
Publication date: 1998